TD(λ) and Q-learning based Ludo players
Authors
Abstract
Similar articles
Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms
Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto, 2017) unifies both. This paper extends Q(σ) to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments suggest ...
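For context, a minimal Python sketch of the one-step Q(σ) backup target described above: σ = 1 recovers the sampled Sarsa target, σ = 0 the Expected-Sarsa-style expectation, and with a greedy target policy the σ = 0 case reduces to the Q-learning target. The function name q_sigma_target, the array layout and the discount value are illustrative assumptions, not the paper's code; the multi-step Q(σ, λ) and Double Q(σ) variants are not shown.

    import numpy as np

    def q_sigma_target(r, q_next, pi_next, a_next, sigma, gamma=0.99):
        # One-step Q(sigma) backup target (illustrative sketch).
        # r       : reward observed on the transition
        # q_next  : array of Q(S', a) over all actions a
        # pi_next : target-policy probabilities pi(a | S')
        # a_next  : action actually sampled in S' (used by the Sarsa part)
        # sigma   : 1 -> pure sampling (Sarsa), 0 -> full expectation
        sarsa_part = q_next[a_next]                      # sampled backup
        expected_part = float(np.dot(pi_next, q_next))   # expected backup
        return r + gamma * (sigma * sarsa_part + (1.0 - sigma) * expected_part)

    # Example: two actions, some target-policy distribution over Q(S', .)
    q_next = np.array([1.0, 2.0])
    pi_next = np.array([0.1, 0.9])
    print(q_sigma_target(r=0.5, q_next=q_next, pi_next=pi_next, a_next=1, sigma=0.5))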
A Counterexample to Temporal Differences Learning
Sutton’s TD(λ) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on λ. For λ = 1 the representation is optimal with respect to a least squares error criterion, but as λ decreases towards 0 the representation becomes progressively worse and, in some cases, very poor....
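For reference, a sketch of tabular TD(λ) prediction with accumulating eligibility traces. The λ-dependence the abstract describes arises when the value function is represented by (linear) function approximation rather than a lookup table, so the sketch only fixes the mechanics of the trace update; the env interface (reset/step), the policy callable and the step sizes are assumptions for illustration.

    import numpy as np

    def td_lambda_episode(env, V, policy, alpha=0.1, gamma=1.0, lam=0.9):
        # One episode of tabular TD(lambda) with accumulating traces.
        # env is assumed to expose reset() -> state and
        # step(action) -> (next_state, reward, done); V is an array of
        # state values indexed by integer state ids.
        e = np.zeros_like(V)              # eligibility trace per state
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]  # TD error
            e[s] += 1.0                   # mark the visited state as eligible
            V += alpha * delta * e        # credit all recently visited states
            e *= gamma * lam              # decay traces; lam = 1 approaches Monte Carlo
            s = s_next
        return V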
Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning ...
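A rough sketch of the QV(λ)-learning idea summarized above, consistent with the abstract's description that the state-value function V is learned with TD(λ) while the Q-values are updated with a Q-learning-like rule: Q is moved toward r + γV(s'), so the learned state values replace the max over actions used by plain Q-learning. The function name qv_lambda_step, the tabular arrays and the step sizes are assumptions; ACLA is not shown.

    import numpy as np

    def qv_lambda_step(Q, V, e, s, a, r, s_next, done,
                       alpha=0.1, beta=0.1, gamma=0.99, lam=0.8):
        # One transition of a QV(lambda)-style update (illustrative sketch).
        target = r + (0.0 if done else gamma * V[s_next])
        # TD(lambda) update of the state-value function
        delta = target - V[s]
        e[s] += 1.0
        V += beta * delta * e
        e *= gamma * lam
        # Q-learning-style update of the action values, bootstrapping on V
        Q[s, a] += alpha * (target - Q[s, a])
        return Q, V, e

    # Example with 5 states and 3 actions
    n_states, n_actions = 5, 3
    Q = np.zeros((n_states, n_actions))
    V = np.zeros(n_states)
    e = np.zeros(n_states)
    Q, V, e = qv_lambda_step(Q, V, e, s=0, a=1, r=1.0, s_next=2, done=False)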
Learning Team Strategies with Multiple Policy-sharing Agents: A Soccer Case Study
We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and policy but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare two learning algorithms: TD-Q learning with linear neural networks (TD-Q) and Probabilistic ...
Differential Eligibility Vectors for Advantage Updating and Gradient Methods
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action to the TD error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergence ...
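The excerpt does not spell out how the differential eligibility vectors are constructed, so the sketch below only shows the conventional tabular Q(λ) step with per-state-action traces that DEV departs from; the Watkins-style trace cut, the function name q_lambda_step and the array layout are assumptions, not the paper's method.

    import numpy as np

    def q_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.99, lam=0.9):
        # Conventional tabular Q(lambda) step with state-action eligibility traces.
        # (Baseline only; DEV replaces this per-(s, a) trace.)
        delta = r + (0.0 if done else gamma * np.max(Q[s_next])) - Q[s, a]
        E[s, a] += 1.0                    # accumulating trace for the taken action
        Q += alpha * delta * E            # spread the TD error along the trace
        if done or a_next != int(np.argmax(Q[s_next])):
            E[:] = 0.0                    # Watkins-style cut after an exploratory action
        else:
            E *= gamma * lam
        return Q, E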